cache must be dynamically populated and pruned based on the application query stream
and access pattern. In this paper, we describe such a cache ... 

Keywords: dynamic content, e-commerce, semantic caching 



Non-volatile memory for fast, reliable file systems
Mary Baker, Satoshi Asami, Etienne Deprit, John Ousterhout, Margo Seltzer
September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international
conference on Architectural support for programming languages and
operating systems, volume 27 issue 9
Full text available: pdf(1.47 MB) Additional Information: full citation, references, citings, index terms



Optimizing method search with lookup caches and incremental coloring
Pascal Andre, Jean-Claude Royer

October 1992 ACM SIGPLAN Notices, conference proceedings on Object-oriented
programming systems, languages, and applications, volume 27 issue 10
Full text available: pdf(1.70 MB) Additional Information: full citation, references, citings, index terms



Keywords: Smalltalk-80, coloring, efficiency, lookup caches, method search, object- 
oriented languages with classes, statistics 



6 Session S4.1: power in memory and network processors: An integrated approach to
reducing power dissipation in memory hierarchies
Jayaprakash Pisharath, Alok Choudhary

October 2002 Proceedings of the international conference on Compilers, architecture,
and synthesis for embedded systems
Full text available: pdf(295.32 KB) Additional Information: full citation, abstract, references, index terms

In recent years, both performance and power have become key factors in efficient memory 
design. In this paper, we propose a systematic approach to reduce the energy consumption 
of the entire memory hierarchy. We first evaluate an existing power-aware memory system 
where memory modules can exist in different power modes, and then propose on-chip 
memory module buffers, called Energy-Saver Buffers (ESB), which reside in-between the L2 
cache and main memory. ESBs reduce the additional overhead incur ... 

Keywords: RDRAM, dynamic cache, energy-delay product, energy-saver buffers (ESB), 
integrated approach, power 



7 OMPI: optimizing MPI programs using partial evaluation
Hirotaka Ogawa, Satoshi Matsuoka

November 1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
(CDROM)

Full text available: pdf(138.70 KB) Additional Information: full citation, abstract, references, index terms

MPI is gaining acceptance as a standard for message-passing in high-performance 
computing, due to its powerful and flexible support of various communication styles. 
However, the complexity of its API poses significant software overhead, and as a result, 
applicability of MPI has been restricted to rather regular, coarse-grained computations. Our 
OMPI (Optimizing MPI) system removes much of the excess overhead by employing partial 
evaluation techniques, which exploit static information of MPI ... 

8 Modeling Rate-Based Dynamic Cache Sharing for Distributed VOD Systems
B. Sonah, M. R. Ito

March 2000 Proceedings of the International Conference on Information
Technology: Coding and Computing (ITCC'00)
Full text available: Publisher Site Additional Information: full citation, abstract

A distributed VOD system includes several VOD subsystems, each VOD sub-system 
consisting of an archive server (AS), a continuous media server (CMS) and a metadata DB. A
VOD sub-system employs an object replacement algorithm by which a video is selected to
be replaced by a new video. Upon a miss, the VOD system must decide onto which CMS to
load the new video. In this paper, we address this issue by modeling a dynamic approach of
logically sharing the overall CMS cache space among the CMS's based on ...

9 A dynamic cache sub-block design to reduce false sharing

Murali Kadiyala, Laxmi N. Bhuyan

October 1995 Proceedings of the 1995 International Conference on Computer Design:
VLSI in Computers and Processors
Full text available: Publisher Site Additional Information: full citation, abstract

Parallel applications suffer from significant bus traffic due to the transfer of shared data.
Large block sizes exploit locality and decrease the effective memory access time. It also has 
a tendency to group data together even though only a part of it is needed by any one 
processor. This is known as the false sharing problem. This research presents a dynamic 
sub-block coherence protocol which minimizes false sharing by trying to dynamically locate 
the point of false reference. Sharing traffic is ... 

Keywords: bus traffic, cache storage, dynamic cache sub-block design, dynamic sub-block 
coherence protocol, false sharing, memory architecture, memory protocols, simulation 
results 



10 Using dynamic cache management techniques to reduce energy in a high-performance
processor

Nikolaos Bellas, Ibrahim Hajj, Constantine Polychronopoulos

August 1999 Proceedings of the 1999 international symposium on Low power
electronics and design
Full text available: pdf(746.50 KB) Additional Information: full citation, citings, index terms



11 Dynamic cache partitioning for vector and matrix computations

Dana Mark Madsen 

January 1997 Doctoral Thesis, Cornell University 
Additional Information: full citation , index terms 



Keywords: vector computations 



12 Load execution latency reduction
Bryan Black, Brian Mueller, Stephanie Postal, Ryan Rakvic, Noppanunt Utamaphethai, John 
Paul Shen 

July 1998 Proceedings of the 12th international conference on Supercomputing 
Full text available: pdf(1.10 MB) Additional Information: full citation, references, citings, index terms



Keywords: load address prediction, load execution, load/store alias, speculative execution, 
value prediction 

13 Page placement algorithms for large real-indexed caches

R. E. Kessler, Mark D. Hill 

November 1992 ACM Transactions on Computer Systems (TOCS), volume 10 issue 4 

Full text available: pdf(1.55 MB) Additional Information: full citation, abstract, references, citings, index terms

When a computer system supports both paged virtual memory and large real-indexed 
caches, cache performance depends in part on the main memory page placement. To date, 
most operating systems place pages by selecting an arbitrary page frame from a pool of
page frames that have been made available by the page replacement algorithm. We give a 
simple model that shows that this naive (arbitrary) page placement leads to up to 30% 
unnecessary cache conflicts. We develop several page placement algor ... 



14 Routing I: Novel architectures for P2P applications: the continuous-discrete approach
Moni Naor, Udi Wieder

June 2003 Proceedings of the fifteenth annual ACM symposium on Parallel algorithms
and architectures

Full text available: pdf(260.13 KB) Additional Information: full citation, abstract, references, index terms

We propose a new approach for constructing P2P networks based on a dynamic 
decomposition of a continuous space into cells corresponding to processors. We demonstrate 
the power of these design rules by suggesting two new architectures, one for DHT 
(Distributed Hash Table) and the other for dynamic expander networks. The DHT network, 
which we call Distance Halving, allows logarithmic routing and load, while preserving
constant degrees. It offers an optimal tradeoff between the degree and the dilati ... 

Keywords: distributed systems, fault tolerance, hash tables, peer-to-peer 



15 Pursuing the Performance Potential of Dynamic Cache Line Sizes

October 1999 Proceedings of the 1999 IEEE International Conference on Computer
Design

Full text available: Publisher Site Additional Information: full citation, abstract

In this paper we examine the application of offline algorithms for determining the optimal 
sequence of loads and superloads (a load of multiple consecutive cache lines) for direct- 
mapped caches. We evaluate potential gains in terms of miss rate and bandwidth and find 
that in many cases optimal superloading can noticeably reduce the miss rate without 
appreciably increasing bandwidth. Then we examine how this performance potential might 
be realized. We examine the effectiveness of a dynamic online ... 

Keywords: Cache Performance, Line Size, Optimal Algorithm, Prediction, Profiling 

16 Dynamic Caching of Query Results for Decision Support Systems

Junho Shim, Peter Scheuermann, Radek Vingralek

July 1999 Proceedings of the 11th International Conference on Scientific
and Statistical Database Management
Full text available: Publisher Site Additional Information: full citation, abstract

The response time of DSS (Decision Support System) queries is typically several orders of 
magnitude higher than the response time of OLTP (OnLine Transaction Processing) queries. 
Since DSS queries are often submitted interactively, techniques for reducing their response 
time are becoming increasingly important. We argue that caching of query results is one 
such technique particularly well suited to the DSS environment. We have designed a query 
cache manager for such an environment. The cache man ... 



17 Memory System Support for Dynamic Cache Line Assembly

Lixin Zhang, Venkata K. Pingali, Bharat Chandramouli, John B. Carter

November 2000 Revised Papers from the Second International Workshop on Intelligent
Memory Systems
Additional Information: full citation

Nikolaos E. Bellas, Ibrahim N. Hajj, Constantine D. Polychronopoulos

December 2000 IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 

Volume 8 Issue 6 
Additional Information: full citation , index terms 



Keywords: low-power-design, memory, performance-trade-offs, system-level 

19 Processor-based system: Tuning of loop cache architectures to programs in embedded
system design
Susan Cotterell, Frank Vahid

October 2002 Proceedings of the 15th international symposium on System Synthesis
Full text available: pdf(58.95 KB) Additional Information: full citation, abstract, references, index terms

Adding a small loop cache to a microprocessor has been shown to reduce average 
instruction fetch energy for various sets of embedded system applications. With the advent 
of core-based design, embedded system designers can now tune a loop cache architecture 
to best match a specific application. We developed an automated simulation environment to 
find the best loop cache architecture for a given application and technology. Using this 
environment, we show significant variation in the best architect ... 

Keywords: architecture tuning, cores, customized architectures, embedded systems, filter 
cache, instruction fetching, loop cache, low energy, low power, memory hierarchy, 
synthesis, tuning 



20 An Optimal Cache for a Federated Database System

Alfredo Goni, Arantza Illarramendi, Eduardo Mena, Jose Miguel Blanco
September 1997 Journal of Intelligent Information Systems, volume 9 issue 2

Full text available: Publisher Site Additional Information: full citation, abstract, index terms

Federated database systems allow users to query different autonomous databases with a 
single request. The answer to those requests must be found on the underlying databases. 
This answering process can be improved if some data are cached within the federated 
database system. The article presents an approach that allows the definition of an optimal 
cache for a federated database system according to a set of parameters. We show the types 
of objects to be cached, the cost model used to dec ... 

Keywords: caching techniques, description logics, federated databases, query processing 

21 Session 5: Fine-grain CAM-tag cache resizing using miss tags
Michael Zhang, Krste Asanovic

August 2002 Proceedings of the 2002 international symposium on Low power
electronics and design
Full text available: pdf(220.91 KB) Additional Information: full citation, abstract, references, index terms

A new dynamic cache resizing scheme for low-power CAM-tag caches is introduced. A 
control algorithm that is only activated on cache misses uses a duplicate set of tags, the 
miss tags, to minimize active cache size while sustaining close to the same hit rate as a full 
size cache. The cache partitioning mechanism saves both switching and leakage energy in 
unused partitions with little impact on cycle time. Simulation results show that the scheme 
saves 28–56% of data cache energy and 34–49 ...

Keywords: cache resizing, content-addressable-memory, energy efficiency, leakage 
current, low-power 



22 An accurate and efficient performance analysis technique for multiprocessor snooping
cache-consistency protocols

M. K. Vernon, E. D. Lazowska, J. Zahorjan

May 1988 ACM SIGARCH Computer Architecture News, Proceedings of the 15th
Annual International Symposium on Computer architecture, volume 16 issue 2

Full text available: pdf(999.88 KB) Additional Information: full citation, abstract, references, citings, index terms

A number of dynamic cache consistency protocols have been developed for multiprocessors 
having a shared bus interconnect between processors and shared memory. The relative 
performance of these protocols has been studied extensively using simulation and detailed 
analytical models based on Markov chain techniques. Both of these approaches use 
relatively detailed models, which capture cache and bus interference rather precisely, but 
which are highly expensive to evaluate. In this paper, we inv ... 

23 Disk cache - miss ratio analysis and design considerations
Alan J. Smith

August 1985 ACM Transactions on Computer Systems (TOCS), volume 3 issue 3

Full text available: pdf(3.13 MB) Additional Information: full citation, abstract, references, citings, index terms, review

The current trend of computer system technology is toward CPUs with rapidly increasing 
processing power and toward disk drives of rapidly increasing density, but with disk 
performance increasing very slowly if at all. The implication of these trends is that at some 
point the processing power of computer systems will be limited by the throughput of the 
input/output (I/O) system. A solution to this problem, which is described and evaluated in
this paper, is disk cache 

24 Predicting data cache misses in non-numeric applications through correlation profiling
Todd C. Mowry, Chi-Keung Luk

December 1997 Proceedings of the 30th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(876.36 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

To maximize the benefit and minimize the overhead of software-based latency tolerance 
techniques, we would like to apply them precisely to the set of dynamic references that 
suffer cache misses. Unfortunately, the information provided by the state-of-the-art cache 
miss profiling technique (summary profiling) is inadequate for references with intermediate 
miss ratios - it results in either failing to hide latency, or else inserting unnecessary 
overhead. To overcome this problem, we propose and ev ... 

Keywords: profiling, cache miss prediction, correlation, non-numeric applications, latency 
tolerance. 



25 Compiler-directed cache polymorphism
J. S. Hu, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, H. Saputra, W. Zhang
June 2002 ACM SIGPLAN Notices, Proceedings of the joint conference on Languages,
compilers and tools for embedded systems: software and compilers for
embedded systems, volume 37 issue 7
Full text available: pdf(419.50 KB) Additional Information: full citation, abstract, references, index terms

Classical compiler optimizations assume a fixed cache architecture and modify the program 
to take best advantage of it. In some cases, this may not be the best strategy because each 
loop nest might work best with a different cache configuration and transforming a nest for a 
given fixed cache configuration may not be possible due to data dependences. Working with 
a fixed cache configuration can also increase energy consumption in loops where the best 
required configuration is smaller than the def ... 

Keywords: cache locality, cache polymorphism, compilers, data reuse, embedded software, 
energy consumption 



26 Power- and Energy-Aware Computing: Energy-efficient instruction cache using page-based
placement

S. Kim, N. Vijaykrishnan, M. Kandemir, M. J. Irwin

November 2001 Proceedings of the international conference on Compilers, architecture,
and synthesis for embedded systems
Full text available: pdf(185.58 KB) Additional Information: full citation, abstract, references, citings

Energy consumption is a crucial factor in designing battery-operated embedded and mobile 
systems. The memory system is a major contributor to the system energy in such 
environments. In order to optimize energy and energy-delay in the memory system, we 
investigate ways of splitting the instruction cache into several smaller units, each of which is 
a cache by itself (called subcache). The subcache architecture employs a page-based 
placement strategy, a dynamic cache line remapping policy an ... 

27 Memory hierarchy reconfiguration for energy and performance in general-purpose
processor architectures

Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, Sandhya Dwarkadas

December 2000 Proceedings of the 33rd annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(155.56 KB), ps(663.39 KB) Additional Information: full citation, references, citings, index terms

28 Using a knowledge cache for interactive discovery of association rules

Biswadeep Nag, Prasad M. Deshpande, David J. DeWitt

August 1999 Proceedings of the fifth ACM SIGKDD international conference on
Knowledge discovery and data mining
Full text available: pdf(1.26 MB) Additional Information: full citation, references, citings, index terms



29 Closing the window of vulnerability in multiphase memory transactions
John Kubiatowicz, David Chaiken, Anant Agarwal

September 1992 ACM SIGPLAN Notices, Proceedings of the fifth international
conference on Architectural support for programming languages and
operating systems, volume 27 issue 9

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

Multiprocessor architects have begun to explore several mechanisms such as prefetching, 
context-switching and software-assisted dynamic cache-coherence, which transform single- 
phase memory transactions in conventional memory systems into multiphase operations. 
Multiphase operations introduce a window of vulnerability in which data can be invalidated 
before it is used. Losing data due to invalidations introduces damaging livelock situations. 
This paper discusses the origins ... 

30 Caches versus object allocation
J. Liedtke

October 1996 Proceedings of the 5th International Workshop on Object Orientation in
Operating Systems (IWOOOS '96)

Full text available: Publisher Site Additional Information: full citation, abstract

Dynamic object allocation usually stresses the randomness of data memory usage; the 
variables of a dynamic cache working set are to some degree distributed stochastically in 
the virtual or physical address space. This interferes with cache architectures, since, 
currently, most of them are highly sensitive to access patterns. In the above mentioned 
stochastically distributed case, the true capacity is far below the cache size and largely 
differs from processor to processor. As a consequence, obje ... 

Keywords: access patterns, cache architectures, data memory usage, dynamic cache 
working set, dynamic object allocation, memory management techniques, object allocation 
schemes, object-oriented programming, physical address space 



31 On Request Forwarding for Dynamic Web Caching Hierarchies
Cho-Yu Chiang, Yingjie Li, Ming T. Liu, Mervin E. Muller

April 2000 Proceedings of the 20th International Conference on Distributed
Computing Systems (ICDCS 2000)
Full text available: Publisher Site Additional Information: full citation, abstract

Based on Caching Neighborhood Protocol (CNP), we proposed a Web caching scheme 
featuring dynamic caching hierarchies as its underlying infrastructure [1]. Dynamic Web
caching hierarchies consist of proxy servers building hierarchies on a per request basis, in 
contrast to static Web caching hierarchies that comprise proxy servers preconfigured into 
hierarchies. Concerns on overheads and efficiency in forwarding requests individually drove 
conventional Web caching schemes to use static Web caching ... 

32 Morphable Cache Architectures: Potential Benefits 

I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, J. Ramanujam 
August 2001 ACM SIGPLAN Notices, volume 36 issue 8

Computer architects have tried to mitigate the consequences of high memory latencies using
a variety of techniques. An example of these techniques is multi-level caches to counteract the
latency that results from having a memory that is slower than the processor. Recent 
research has demonstrated that compiler optimizations that modify data layouts and 
restructure computation can be successful in improving memory system performance. 
However, in many cases, working with a fixed cache configuration ... 



33 Dynamic services and analysis: Engineering and hosting adaptive freshness-sensitive
web applications on data centers

Wen-Syan Li, Oliver Po, Wang-Pin Hsiung, K. Selçuk Candan, Divyakant Agrawal

May 2003 Proceedings of the twelfth international conference on World Wide Web

Full text available: pdf(10.31 MB) Additional Information: full citation, abstract, references, index terms

Wide-area database replication technologies and the availability of content delivery networks 
allow Web applications to be hosted and served from powerful data centers. This form of 
application support requires a complete Web application suite to be distributed along with 
the database replicas. A major advantage of this approach is that dynamic content is served 
from locations closer to users, leading to reduced network latency and fast response
times. However, this is achieved at the expense ... 

Keywords: database-driven web applications, dynamic content, freshness, response time, 
network latency, web acceleration



Synthesis of customized loop caches for core-based embedded systems
Susan Cotterell, Frank Vahid

November 2002 Proceedings of the 2002 IEEE/ACM international conference on
Computer-aided design
Full text available: pdf(92.19 KB) Additional Information: full citation, abstract, references

Embedded system programs tend to spend much time in small loops. Introducing a very 
small loop cache into the instruction memory hierarchy has thus been shown to substantially 
reduce instruction fetch energy. However, loop caches come in many sizes and variations - 
using the configuration best on the average may actually result in worsened energy for a 
specific program. We therefore introduce a loop cache exploration tool that analyzes a 
particular program's profile, rapidly explores the possib ... 

Keywords: architecture tuning, customized architectures, embedded systems, estimation, 
instruction fetching, loop cache, low energy, low power, memory hierarchy, synthesis, 
tuning 

35 A locality-preserving cache-oblivious dynamic dictionary
Michael A. Bender, Ziyang Duan, John Iacono, Jing Wu

January 2002 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete
algorithms

Full text available: pdfd.oe MB) Additional Information: full citation, abstract

This paper presents a simple dictionary structure designed for a hierarchical memory. The 
proposed data structure is cache oblivious and locality preserving. A cache-oblivious data 
structure has memory performance optimized for all levels of the memory hierarchy even 
though it has no memory-hierarchy-specific parameterization. A locality-preserving 
dictionary maintains elements of similar key values stored close together for fast access to 
ranges of data with consecutive keys. The d ...

Panos Kalnis, Dimitris Papadias

May 2001 ACM SIGMOD Record, Proceedings of the 2001 ACM SIGMOD international
conference on Management of data, volume 30 issue 2

Full text available: pdf(215.70 KB) Additional Information: full citation, abstract, references, citings, index terms

Data warehouses have been successfully employed for assisting decision making by offering
a global view of the enterprise data and providing mechanisms for On-Line Analytical 
processing. Traditionally, data warehouses are utilized within the limits of an enterprise or 
organization. The growth of Internet and WWW however, has created new opportunities for 
data sharing among ad-hoc, geographically spanned and possibly mobile users. Since it is 
impractical for each enterprise to set up a worldwi ... 

37 Cache investment: integrating query optimization and distributed data placement
Donald Kossmann, Michael J. Franklin, Gerhard Drasch, Wig Ag 
December 2000 ACM Transactions on Database Systems (TODS), volume 25 issue 4 

Full text available: pdf(210.67 KB) Additional Information: full citation, abstract, references, citings, index terms

Emerging distributed query-processing systems support flexible execution strategies in 
which each query can be run using a combination of data shipping and query shipping. As in 
any distributed environment, these systems can obtain tremendous performance and 
availability benefits by employing dynamic data caching. When flexible execution and 
dynamic caching are combined, however, a circular dependency arises: Caching occurs as a 
by-product of query operator placement, but query operator pi ... 

Keywords: cache investment, caching, client-server database systems, data shipping, 
dynamic data placement, query optimization, query shipping 



38 Application-specific memory management for embedded systems using software-controlled
caches

Derek Chiou, Prabhat Jain, Larry Rudolph, Srinivas Devadas

June 2000 Proceedings of the 37th conference on Design automation

Full text available: pdf(76.30 KB) Additional Information: full citation, abstract, references, citings, index terms

We propose a way to improve the performance of embedded processors running data- 
intensive applications by allowing software to allocate on-chip memory on an application- 
specific basis. On-chip memory in the form of cache can be made to act like scratch-pad 
memory via a novel hardware mechanism, which we call column caching. Column caching 
enables dynamic cache partitioning in software, by mapping data regions to specified sets
of cache "columns" or "ways ... 

39 Linux Vs. Windows NT and OS/2
Bernie Thompson

January 1994 Linux Journal

Full text available: html(16.27 KB) Additional Information: full citation, abstract, index terms

We continue to see media blurbs and ads for both Microsoft's Windows NT and IBM's OS/2. 
Both promise to be the operating system that we need and to take advantage of the Intel 
386 and beyond 

40 Caching multidimensional queries using chunks
Prasad M. Deshpande, Karthikeyan Ramasamy, Amit Shukla, Jeffrey F. Naughton

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data, volume 27 issue 2

Full text available: pdf(1.55 MB) Additional Information: full citation, abstract, references, citings, index terms

Caching has been proposed (and implemented) by OLAP systems in order to reduce
response times for multidimensional queries. Previous work on such caching has considered 
table level caching and query level caching. Table level caching is more suitable for static 
schemes. On the other hand, query level caching can be used in dynamic schemes, but is 
too coarse for "large" query results. Query level caching has the further drawback for small 
query results in that it is only effectiv ... 

41 Informing memory operations: memory performance feedback mechanisms and their
applications

Mark Horowitz, Margaret Martonosi, Todd C. Mowry, Michael D. Smith

May 1998 ACM Transactions on Computer Systems (TOCS), volume 16 issue 2

Full text available: pdf(344.74 KB) Additional Information: full citation, abstract, references, index terms, review



Memory latency is an important bottleneck in system performance that cannot be 
adequately solved by hardware alone. Several promising software techniques have been 
shown to address this problem successfully in specific situations. However, the generality of 
these software approaches has been limited because current architectures do not provide a 
fine-grained, low-overhead mechanism for observing and reacting to memory behavior 
directly. To fill this need, this article proposes a new class ... 



Keywords: cache miss notification, memory latency, processor architecture 



42 Adaptive data prefetching using cache information
Ando Ki, Alan E. Knowles

July 1997 Proceedings of the 11th international conference on Supercomputing
Full text available: pdf(1.89 MB) Additional Information: full citation, references, citings, index terms



43 Transactional client-server cache consistency: alternatives and performance
Michael J. Franklin, Michael J. Carey, Miron Livny

September 1997 ACM Transactions on Database Systems (TODS), volume 22 issue 3

Full text available: pdf(452.41 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Client-server database systems based on a data shipping model can exploit client memory 
resources by caching copies of data items across transaction boundaries. Caching reduces 
the need to obtain data from servers or other sites on the network. In order to ensure that 
such caching does not result in the violation of transaction semantics, a transactional cache 
consistency maintenance algorithm is required. Many such algorithms have been proposed 
in the literature and, as all provide the sam ... 

44 Stack caching for interpreters
M. Anton Ertl

June 1995 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1995 conference
on Programming language design and implementation, volume 30 issue 6

An interpreter can spend a significant part of its execution time on accessing arguments of
virtual machine instructions. This paper explores two methods to reduce this overhead for 
virtual stack machines by caching top-of-stack values in (real machine) registers. The 
dynamic method is based on having, for every possible state of the cache, one specialized 
version of the whole interpreter; the execution of an instruction usually changes the state of 
the cache and the next i ... 

1 Tuning garbage collection for reducing memory system energy in an embedded Java
environment

G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, M. Wolczko
November 2002 ACM Transactions on Embedded Computing Systems (TECS), Volume 1
Issue 1

Full text available: pdf(740.23 KB) Additional Information: full citation, abstract, references, index terms

Java has been widely adopted as one of the software platforms for the seamless integration 
of diverse computing devices. Over the last year, there has been great momentum in 
adopting Java technology in devices such as cellphones, PDAs, and pagers where optimizing 
energy consumption is critical. Since, traditionally, the Java virtual machine (JVM), the 
cornerstone of Java technology, is tuned for performance, taking into account energy 
consumption requires reevaluation, and possibly redesign oft ... 

Keywords: Garbage collector, Java Virtual Machine (JVM), K Virtual Machine (KVM), low 
power computing 



2 Concurrent garbage collection using hardware-assisted profiling
Timothy H. Heil, James E. Smith

October 2000 ACM SIGPLAN Notices, Proceedings of the second international
symposium on Memory management, volume 36 issue 1
Full text available: pdf(1.74 MB) Additional Information: full citation, abstract, citings, index terms

In the presence of on-chip multithreading, a Virtual Machine (VM) implementation can 
readily take advantage of service threads for enhancing performance by performing tasks 
such as profile collection and analysis, dynamic optimization, and garbage collection 
concurrently with program execution. In this context, a hardware-assisted profiling 
mechanism is proposed. The Relational Profiling Architecture (RPA) is designed from the top 
down. RPA is based on a relational model similar ... 

3 Garbage collection for a client-server persistent object store
Laurent Amsaleg, Michael J. Franklin, Olivier Gruber

August 1999 ACM Transactions on Computer Systems (TOCS), volume 17 issue 3

Full text available: pdf(267.18 KB) Additional Information: full citation, abstract, references, citings, index terms, review

We describe an efficient server-based algorithm for garbage collecting persistent object 
stores in a client-server environment. The algorithm is incremental and runs concurrently 
with client transactions. Unlike previous algorithms, it does not hold any transactional locks 
on data and does not require callbacks to clients. It is fault-tolerant, but performs very little 
logging. The algorithm has been designed to be integrated into existing systems, and 

4 The measured cost of copying garbage collection mechanisms
Michael W. Hicks, Jonathan T. Moore, Scott M. Nettles

August 1997 ACM SIGPLAN Notices, Proceedings of the second ACM SIGPLAN
international conference on Functional programming, volume 32 issue 8
Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, index terms

We examine the costs and benefits of a variety of copying garbage collection (GC) 
mechanisms across multiple architectures and programming languages. Our study covers 
both low-level object representation and copying issues as well as the mechanisms needed 
to support more advanced techniques such as generational collection, large object spaces, 
and type segregated areas. Our experiments are made possible by a novel performance 
analysis tool, Oscar. Oscar allows us to capture snapshots of pr ... 

5 Garbage collection in object-oriented databases using transactional cyclic reference
counting

P. Roy, S. Seshadri, A. Silberschatz, S. Sudarshan, S. Ashwin 

August 1998 The VLDB Journal — The International Journal on Very Large Data Bases, 
Volume 7 Issue 3 

Full text available: pdf(180.00 KB) Additional Information: full citation, abstract

Garbage collection is important in object-oriented databases to free the programmer from 
explicitly deallocating memory. In this paper, we present a garbage collection algorithm, 
called Transactional Cyclic Reference Counting (TCRC), for object-oriented databases. The 
algorithm is based on a variant of a reference-counting algorithm proposed for functional 
programming languages. The algorithm keeps track of auxiliary reference count information 
to detect and collect cyclic garbage. The algorithm ... 

6 Memory subsystem performance of programs using copying garbage collection
Amer Diwan, David Tarditi, Eliot Moss

February 1994 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles
of programming languages
Full text available: pdf(1.28 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Heap allocation with copying garbage collection is believed to have poor memory subsystem 
performance. We conducted a study of the memory subsystem performance of heap 
allocation for memory subsystems found on many machines. We found that many machines 
support heap allocation poorly. However, with the appropriate memory subsystem 
organization, heap allocation can have good memory subsystem performance. 

7 Cache performance of garbage-collected programs
Mark B. Reinhold

June 1994 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 1994 conference
on Programming language design and implementation, volume 29 issue 6
Full text available: pdf(1.46 MB) Additional Information: full citation, abstract, references, citings, index terms

As processor speeds continue to improve relative to main-memory access times, cache 
performance is becoming an increasingly important component of program performance. 
Prior work on the cache performance of garbage-collected programs either argues or 
assumes that conventional garbage-collection methods will yield poor performance, and has 
therefore concentrated on new collection algorithms designed specifically to improve cache- 
level reference locality. This paper argues to the c ... 

8 Caching considerations for generational garbage collection
Paul R. Wilson, Michael S. Lam, Thomas G. Moher

January 1992 ACM SIGPLAN Lisp Pointers, Proceedings of the 1992 ACM conference on
LISP and functional programming, volume V issue 1
Full text available: pdf(1.09 MB) Additional Information: full citation, references, citings, index terms



9 Concurrent compacting garbage collection of a persistent heap
James O'Toole, Scott Nettles, David Gifford

December 1993 ACM SIGOPS Operating Systems Review, Proceedings of the
fourteenth ACM symposium on Operating systems principles, volume 27
issue 5

Full text available: pdf(1.50 MB) Additional Information: full citation, references, citings, index terms



10 Partitioned garbage collection of a large object store
Umesh Maheshwari, Barbara Liskov

June 1997 ACM SIGMOD Record, Proceedings of the 1997 ACM SIGMOD international
conference on Management of data, volume 26 issue 2
Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

We present new techniques for efficient garbage collection in a large persistent object store. 
The store is divided into partitions that are collected independently using information about 
inter-partition references. This information is maintained on disk so that it can be recovered 
after a crash. We use new techniques to organize and update this information while avoiding 
disk accesses. We also present a new global marking scheme to collect cyclic garbage across 
partitions. Global marking ... 

Keywords: cyclic garbage, garbage collection, object database, partitions 



11 Garbage collecting the Internet: a survey of distributed garbage collection
Saleh E. Abdullahi, Graem A. Ringwood 

September 1998 ACM Computing Surveys (CSUR), volume 30 issue 3 

Full text available: pdf(337.65 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Internet programming languages such as Java present new challenges to garbage-collection 
design. The spectrum of garbage-collection schema for linked structures distributed over a 
network are reviewed here. Distributed garbage collectors are classified first because they 
evolved from single-address-space collectors. This taxonomy is used as a framework to 
explore distribution issues: locality of action, communication overhead and indeterministic 
communication latency. 

Keywords: automatic storage reclamation, distributed, distributed file systems, distributed 
memories, distributed object-oriented management, memory management, network 
communication, object-oriented databases, reference counting 



12 Very concurrent mark-&-sweep garbage collection without fine-grain synchronization
Lorenz Huelsbergen, Phil Winterbottom

October 1998 ACM SIGPLAN Notices, Proceedings of the first international symposium
on Memory management, volume 34 issue 3
Full text available: pdf(1.36 MB) Additional Information: full citation, abstract, references, citings, index terms

We describe a new incremental algorithm for the concurrent reclamation of a program's 
allocated, yet unreachable, data. Our algorithm is a variant of mark-&-sweep collection 
that, unlike prior designs, runs mutator, marker, and sweeper threads concurrently 
without explicit fine-grain synchronization on shared-memory multiprocessors. A global, but 
infrequent, synchronization coordinates the per-object coloring marks used by the three 
threads; fine-grain synchronization is achieve ... 

13 Creating and preserving locality of Java applications at allocation and garbage
collection times

Yefim Shuf, Manish Gupta, Hubertus Franke, Andrew Appel, Jaswinder Pal Singh
November 2002 ACM SIGPLAN Notices, Proceedings of the 17th ACM conference on
Object-oriented programming, systems, languages, and applications,
Volume 37 Issue 11

Full text available: pdf(180.20 KB) Additional Information: full citation, abstract, references, index terms

The growing gap between processor and memory speeds is motivating the need for 
optimization strategies that improve data locality. A major challenge is to devise techniques 
suitable for pointer-intensive applications. This paper presents two techniques aimed at 
improving the memory behavior of pointer-intensive applications with dynamic memory 
allocation, such as those written in Java. First, we present an allocation time object 
placement technique based on the recently introduced notion of p ... 

Keywords: JVM, Java, garbage collection, heap traversal, locality, locality based graph 
traversal, memory allocation, memory management, object co-allocation, object placement, 
prolific types, run-time systems 



14 Contaminated garbage collection 

Dante J. Cannarozzi, Michael P. Plezbert, Ron K. Cytron 

May 2000 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2000 conference
on Programming language design and implementation, volume 35 issue 5

Full text available: pdf(559.20 KB) Additional Information: full citation, abstract, references, citings, index terms

We describe a new method for determining when an object can be garbage collected. The 
method does not require marking live objects. Instead, each object X is dynamically 
associated with a stack frame M, such that X is collectable when M pops. Because X could 
have been dead earlier, our method is conservative. Our results demonstrate that the 
method nonetheless identifies a large percentag ... 

15 Concurrent replicating garbage collection
James O'Toole, Scott Nettles

July 1994 ACM SIGPLAN Lisp Pointers, Proceedings of the 1994 ACM conference on
LISP and functional programming, volume VII issue 3
Full text available: pdf(919.87 KB) Additional Information: full citation, abstract, references, citings, index terms

We have implemented a concurrent copying garbage collector that uses replicating garbage 
collection. In our design, the client can continuously access the heap during garbage 
collection. No low-level synchronization between the client and the garbage collector is 
required on individual object operations. The garbage collector replicates live heap objects 
and periodically synchronizes with the client to obtain the client's current root set and 
mutation log. An experimental implementation usi ... 

16 Reducing garbage collector cache misses 
Hans-J. Boehm 

October 2000 ACM SIGPLAN Notices, Proceedings of the second international
symposium on Memory management, volume 36 issue 1
Full text available: pdf(774.19 KB) Additional Information: full citation, abstract, citings, index terms

Cache misses are currently a major factor in the cost of garbage collection, and we expect 
them to dominate in the future. Traditional garbage collection algorithms exhibit relatively 
little temporal locality; each live object in the heap is likely to be touched exactly once 
during each garbage collection. We measure two techniques for dealing with this issue: 
prefetch-on-grey, and lazy sweeping. The first of these is new in this context. Lazy sweeping 
has been in common use for a decade. It ... 

17 Portable, unobtrusive garbage collection for multiprocessor systems 
Damien Doligez, Georges Gonthier 

February 1994 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles
of programming languages

Full text available: pdf(1.37 MB) Additional Information: full citation, abstract, references, citings, index terms

We describe and prove the correctness of a new concurrent mark-and-sweep garbage 
collection algorithm. This algorithm derives from the classical on-the-fly algorithm from 
Dijkstra et al. [9]. A distinguishing feature of our algorithm is that it supports multiprocessor 
environments where the registers of running processes are not readily accessible, without 
imposing any overhead on the elementary operations of loading a register or reading or 
initializing a field. Furthermor ... 

18 Concurrent garbage collection using program slices on multithreaded processors
Manoj Plakal, Charles N. Fischer
October 2000 ACM SIGPLAN Notices, Proceedings of the second international
symposium on Memory management, volume 36 issue 1
Full text available: pdf(957.62 KB) Additional Information: full citation, abstract, citings, index terms

We investigate reference counting in the context of a multi-threaded architecture by 
exploiting two observations: (1) reference-counting can be performed by a transformed 
program slice of the mutator that isolates heap references, and (2) hardware trends indicate 
that microprocessors in the near future will be able to execute multiple concurrent threads 
on a single chip. We generate a reference-counting collector as a transformed program slice 
of an application and then execute this slice in ... 

19 Fast out-of-order processor simulation using memoization 
Eric Schnarr, James R. Larus 

October 1998 Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems, volume 32, 33
issue 5, 11

Full text available: pdf(1.43 MB) Additional Information: full citation, abstract, references, citings, index terms

Our new out-of-order processor simulator, FastSim, uses two innovations to speed up 
simulation 8-15 times (vs. Wisconsin SimpleScalar) with no loss in simulation accuracy. 
First, FastSim uses speculative direct-execution to accelerate the functional emulation of 
speculatively executed program code. Second, it uses a variation on memoization, a 
well-known technique in programming language implementation, to cache microarchitecture 
states and the resulting simulator actions, and then "fast forw ... 

Keywords: direct-execution, memoization, out-of-order processor simulation 



20 Cache performance of fast-allocating programs 
Marcelo J. R. Gonçalves, Andrew W. Appel 

October 1995 Proceedings of the seventh international conference on Functional 
programming languages and computer architecture 

Full text available: pdf(1.47 MB) Additional Information: full citation, references, citings, index terms

1 A non-fragmenting non-moving, garbage collector
Gustavo Rodriguez-Rivera, Michael Spertus, Charles Fiterman

October 1998 ACM SIGPLAN Notices, Proceedings of the first international symposium
on Memory management, volume 34 issue 3
Full text available: pdf(750.52 KB) Additional Information: full citation, abstract, references, index terms

One of the biggest disadvantages of non-moving collectors compared to moving collectors 
has been their limited ability to deal with memory fragmentation. In this paper, we describe 
two techniques to reduce fragmentation without the need for moving live data. The first 
technique reduces internal fragmentation in BiBoP (Big-Bag-of-Pages) like allocators. The 
second technique reduces external fragmentation using virtual memory calls available in most 
modern operating systems. It can also reduce the ... 

Keywords: conservative garbage collection, fragmentation, garbage collection, memory 
allocation, non-copying garbage collection 